An Efficient Text Summarizer using Lexical Chains
Abstract
We present a system which uses lexical chains as an intermediate representation for automatic text summarization. This system builds on previous research by implementing a lexical chain extraction algorithm in linear time. The system is reasonably domain independent and takes as input any text or HTML document. The system outputs a short summary based on the most salient concepts from the original document. The length of the extracted summary can be controlled either automatically or manually, by target length or by percentage of compression. While still under development, the system provides useful summaries which compare well in information content to human-generated summaries. Additionally, the system provides a robust test bed for future summary generation research.

1 Introduction

Automatic text summarization has long been viewed as a two-step process. First, an intermediate representation of the summary must be created. Second, a natural language representation of the summary must be generated using the intermediate representation (Sparck Jones, 1993). Much of the early research in automatic text summarization has involved generation of the intermediate representation. The natural language generation problem has only recently received substantial attention in the context of summarization.
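As a rough illustration of the extractive pipeline the abstract describes (group related words into chains, score sentences by the chains they touch, and cut the output to a requested compression), a minimal sketch follows. It is not the authors' algorithm: the `TOY_CONCEPTS` dictionary is a hypothetical stand-in for a real lexical resource such as WordNet, no word-sense disambiguation is attempted, and the scoring rule and the names `build_chains` and `summarize` are assumptions made for this example only.

```python
import re
from collections import defaultdict

# Hypothetical toy thesaurus (assumption): word -> concept label.
# A real system would consult a lexical resource such as WordNet.
TOY_CONCEPTS = {
    "car": "vehicle", "truck": "vehicle", "vehicle": "vehicle",
    "engine": "machine", "motor": "machine",
    "rain": "weather", "storm": "weather",
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def build_chains(sentences):
    """Group word occurrences by concept in a single pass over the words.

    Returns a dict mapping concept -> list of sentence indices,
    one entry per occurrence of a word belonging to that concept.
    """
    chains = defaultdict(list)
    for idx, sentence in enumerate(sentences):
        for word in re.findall(r"[a-z]+", sentence.lower()):
            concept = TOY_CONCEPTS.get(word)
            if concept is not None:
                chains[concept].append(idx)
    return chains


def summarize(text, compression=0.5):
    """Return about `compression` * N sentences, kept in original order."""
    sentences = split_sentences(text)
    if not sentences:
        return []
    chains = build_chains(sentences)
    # Assumed scoring rule: a sentence earns the length of every chain
    # it participates in, so sentences on long chains rank highest.
    scores = [0] * len(sentences)
    for members in chains.values():
        for idx in set(members):
            scores[idx] += len(members)
    keep = max(1, round(len(sentences) * compression))
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    return [sentences[i] for i in sorted(ranked[:keep])]
```

Calling `summarize(text, compression=0.25)` on a four-sentence text keeps one sentence; a fixed sentence count could be supported the same way by passing `keep` directly instead of deriving it from a ratio.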
Similar resources
Text Summarization Using Lexical Chains
Text summarization addresses both the problem of selecting the most important portions of text and the problem of generating coherent summaries. We present in this paper the summarizer of the University of Lethbridge at DUC 2001, which is based on an efficient use of lexical chains.
An Efficient Text Summarizer Using Lexical Chains
We present a system which uses lexical chains as an intermediate representation for automatic text summarization. This system builds on previous research by implementing a lexical chain extraction algorithm in linear time. The system is reasonably domain independent and takes as input any text or HTML document. The system outputs a short summary based on the most salient concepts from the origi...
IS_SUM: A Multi-Document Summarizer based on Document Index Graphic and Lexical Chains
IS_SUM is a summarizer developed at the Institute of Software (IS) of the Chinese Academy of Sciences for DUC 2005. We adopt a new way of clustering and summarizing documents by integrating Document Index Graphic (DIG) [7] with Lexical Chains [5]. Our results show the benefit of integrating DIG with Lexical Chains.
Cohesion and coherence for Automatic Summarization
This paper presents the integration of cohesive properties of text with coherence relations, to obtain an adequate representation of text for automatic summarization. A summarizer based on Lexical Chains is enhanced with rhetorical and argumentative structure obtained via Discourse Markers. When evaluated on a newspaper corpus, this integration yields only slight improvement in the resulting s...
Integrating cohesion and coherence for Automatic Summarization
This paper presents the integration of cohesive properties of text with coherence relations, to obtain an adequate representation of text for automatic summarization. A summarizer based on Lexical Chains is enhanced with rhetorical and argumentative structure obtained via Discourse Markers. When evaluated on a newspaper corpus, this integration yields only slight improvement in the resulting s...